feat: add npu_recurrent_gated_delta_rule and chunk_gated_delta_rule fusion operators for qwen3.5/qwen3-next. #1262
Open
fems14 wants to merge 1 commit into jd-opensource:main
Code Review
This pull request updates the Qwen3GatedDeltaNetBase implementation to utilize new NPU-specific kernels for chunked and recurrent gated-delta attention and adds cumulative sequence length calculation to the attention metadata. However, several critical issues need to be addressed: the initial state fetched from the SSM cache is being incorrectly zeroed out, which breaks statefulness; the recurrent state is transposed before being stored back in the cache, leading to a layout mismatch; and the removal of head repetition logic combined with the use of squeeze(0) on tensors could result in shape and dimension errors.
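The cumulative sequence lengths mentioned in the summary can be sketched as below. This is a minimal illustration only, assuming the usual convention in which the array has one more entry than there are sequences and starts at 0; the helper name `build_cu_seqlens` and the use of `std::vector<int32_t>` are hypothetical and are not taken from the PR, which computes the values inside the attention metadata.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper (not from the PR): builds cumulative sequence lengths
// from per-sequence token counts. The result has seq_lens.size() + 1 entries,
// starts at 0, and entry i + 1 is the end offset of sequence i in the packed
// token dimension.
std::vector<int32_t> build_cu_seqlens(const std::vector<int32_t>& seq_lens) {
  std::vector<int32_t> cu_seqlens;
  cu_seqlens.reserve(seq_lens.size() + 1);
  cu_seqlens.push_back(0);
  for (int32_t len : seq_lens) {
    assert(len >= 0 && "sequence lengths must be non-negative");
    // Running total: each entry is the previous offset plus this length.
    cu_seqlens.push_back(cu_seqlens.back() + len);
  }
  return cu_seqlens;
}
```

For per-sequence lengths {3, 5, 2} this yields {0, 3, 8, 10}, so tokens of sequence i occupy the half-open range [cu_seqlens[i], cu_seqlens[i + 1]) of a packed batch.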
54fd624 to 44ca51d
yingxudeng reviewed Apr 16, 2026
zhang-minchao previously approved these changes Apr 20, 2026
yingxudeng previously approved these changes Apr 20, 2026
yingxudeng reviewed Apr 21, 2026
#include <tuple>

#include "torch_npu/csrc/aten/CustomFunctions.h"
Is this change necessary? If CI/CD passes, leave it as is.
If CI/CD fails and further changes are needed, could this be removed?
3616aaa to a21c06b
JimHsiung previously approved these changes Apr 21, 2026
…le operations on NPU

The main changes include:
1. Add the implementation file npu_recurrent_gated_delta_rule.cpp
2. Add function declarations in npu_ops_api.h
3. Add a generic interface in ops_api.h and ops_api.cpp
4. Update CMakeLists.txt to include the new source files
5. Integrate new operations in qwen3_gated_delta_net_base.cpp
6. Update submodule version
zhang-minchao approved these changes Apr 22, 2026
yingxudeng approved these changes Apr 22, 2026

Add the npu_recurrent_gated_delta_rule and chunk_gated_delta_rule fusion operators for the qwen3.5/qwen3-next models.
Depends on the operator PR being merged first: https://gitcode.com/xLLM-AI/torch_npu_ops/pull/12